Multiple-Precision Fixed-Point Vector Multiply-Accumulator Using Shared Segmentation

Authors

  • Dimitri Tan
  • Albert Danysh
  • Michael J. Liebelt
Abstract

This paper presents a 64-bit fixed-point vector multiply-accumulator (MAC) architecture that supports multiple precisions. The vector MAC can perform one 64x64-bit, two 32x32-bit, four 16x16-bit, or eight 8x8-bit signed or unsigned multiply-accumulates using essentially the same hardware as a scalar 64-bit MAC, with only a small increase in delay. The scalar MAC architecture is "vectorized" by inserting mode-dependent multiplexing into the partial product generation and by inserting mode-dependent kills into the carry chains of the reduction tree and the final carry-propagate adder. This is an example of "shared segmentation", in which the existing scalar structure is segmented and then shared between vector modes. The vector MAC is area efficient and can be fully pipelined, which makes it suitable for high-performance processors and possibly for dynamically reconfigurable processors.
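The idea of shared segmentation can be illustrated with a small software model. The following is only a behavioral sketch under stated assumptions, not the circuit described in the paper: it handles unsigned operands only, assumes a 128-bit accumulator whose lanes are twice the operand lane width, and replaces the reduction tree with a sequence of segmented additions. All function names (`segmented_add`, `vector_mac`, `pack`, `unpack`) are hypothetical. It shows the two mechanisms named in the abstract: partial product bits that pair a multiplicand bit with a multiplier bit from a different lane are masked out, and carries are killed at lane boundaries.

```python
TOTAL_IN = 64    # operand width in bits
TOTAL_OUT = 128  # assumed product/accumulator width in bits


def segmented_add(a, b, lane_bits, total_bits=TOTAL_OUT):
    """Add a and b, killing the carry at every lane_bits boundary."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for shift in range(0, total_bits, lane_bits):
        lane_sum = ((a >> shift) & lane_mask) + ((b >> shift) & lane_mask)
        # Masking drops the carry out of this lane (the "kill").
        result |= (lane_sum & lane_mask) << shift
    return result


def vector_mac(acc, x, y, lane_bits):
    """Unsigned per-lane acc + x * y.

    x and y hold 64 // lane_bits lanes of lane_bits each;
    acc holds the same number of lanes, each 2 * lane_bits wide.
    """
    assert lane_bits in (8, 16, 32, 64)
    lane_mask = (1 << lane_bits) - 1
    for j in range(TOTAL_IN):
        if (y >> j) & 1:
            lane = j // lane_bits
            # Mode-dependent selection: keep only the multiplicand bits
            # that share a lane with multiplier bit j.
            x_lane = x & (lane_mask << (lane * lane_bits))
            # Shifting by j places this row inside output lane `lane`.
            acc = segmented_add(acc, x_lane << j, 2 * lane_bits)
    return acc


def pack(values, lane_bits):
    """Pack a list of unsigned lane values into one integer, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * lane_bits)
    return word


def unpack(word, lane_bits, count):
    """Split an integer into `count` unsigned lanes, lane 0 lowest."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(count)]
```

For example, `unpack(vector_mac(pack([10] * 8, 16), pack(xs, 8), pack(ys, 8), 8), 16, 8)` gives the eight 16-bit results for two lists `xs` and `ys` of 8-bit values, and calling `vector_mac` with `lane_bits=64` reduces to an ordinary 64x64-bit multiply-accumulate. The same loop serves every mode; only the mask and the carry-kill positions change, which is the sense in which the scalar structure is shared.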

Similar resources

Mixed Precision Training of Convolutional Neural Networks using Integer Operations

The state-of-the-art (SOTA) for mixed precision training is dominated by variants of low precision floating point operations, and in particular FP16 accumulating into FP32 (Micikevicius et al., 2017). On the other hand, while a lot of research has also happened in the domain of low and mixed-precision Integer training, these works either present results for non-SOTA networks (for instance only A...

Charge-Mode Parallel Architecture for Vector–Matrix Multiplication

An internally analog, externally digital architecture for parallel vector–matrix multiplication is presented. A three-transistor unit cell combines a single-bit dynamic random-access memory and a charge injection device binary multiplier and analog accumulator. Digital multiplication of variable resolution is obtained with bit-serial inputs and bit-parallel storage of matrix elements, by combini...

A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications

In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature size and increased frequency, power dissipation on the data bus has become the most important factor compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the develop...

Multiple Sclerosis Lesions Segmentation in Magnetic Resonance Imaging using Ensemble Support Vector Machine (ESVM)

Background: Multiple Sclerosis (MS) syndrome is a type of Immune-Mediated disorder in the central nervous system (CNS) which destroys myelin sheaths, and results in plaque (lesion) formation in the brain. From the clinical point of view, investigating and monitoring information such as position, volume, number, and changes of these plaques are integral parts of the controlling process this dise...

Efficient Reproducible Floating Point Summation and BLAS

We define reproducibility to mean getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should ideally not change the answer. Many users depend on reproducibility for debugging or correctness [1]. However, dynamic scheduling of parallel computing resources, combined with nonassociativity of floating point additi...

Publication date: 2003